Adopting multiple vision transformer layers for fine-grained image representation


Abstract

Accurate proposal of discriminative regions is important for fine-grained image recognition. The vision transformer (ViT) has had a striking effect in computer vision due to its innate multi-head self-attention mechanism. However, its attention maps become increasingly similar after certain layers, and because ViT relies on an added classification token to perform classification, it cannot effectively select discriminative image patches for fine-grained classification. To detect discriminative regions accurately, we propose a novel network, AMTrans, which efficiently increases the number of layers to learn diverse features and uses integrated raw attention maps to capture more salient features. Specifically, we employ DeepViT as the backbone to solve the attention collapse issue. Then, we fuse the attention weights of each head within each layer to produce an attention weight map. After that, we alternately use recurrent residual refinement blocks to promote salient feature detection and then use a semantic grouping method to propose the discriminative feature region. Extensive experiments show that AMTrans achieves state-of-the-art (SOTA) performance on three widely used fine-grained datasets under the same settings: Stanford-Cars, Stanford-Dogs, and CUB-200-2011.
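The head-fusion and patch-selection idea in the abstract can be illustrated with a small sketch. This is not the AMTrans implementation: the abstract does not state the fusion rule, so the code assumes a plain average over heads and over layers, and the names `fuse_heads`, `cls_patch_scores`, and `top_k_patches` are hypothetical. Token 0 is assumed to be the classification token.

```python
from typing import List

Matrix = List[List[float]]  # one attention matrix: tokens x tokens


def fuse_heads(heads: List[Matrix]) -> Matrix:
    """Average the attention matrices of all heads in one layer (assumed rule)."""
    n_heads = len(heads)
    n_tokens = len(heads[0])
    return [
        [sum(h[i][j] for h in heads) / n_heads for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


def cls_patch_scores(layers: List[List[Matrix]]) -> List[float]:
    """Average, over layers, the fused attention from token 0 to each patch."""
    fused = [fuse_heads(heads) for heads in layers]
    n_patches = len(fused[0]) - 1  # every token except the class token
    return [
        sum(layer[0][p + 1] for layer in fused) / len(fused)
        for p in range(n_patches)
    ]


def top_k_patches(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring patches, best first."""
    order = sorted(range(len(scores)), key=lambda p: scores[p], reverse=True)
    return order[:k]


# Toy input: 1 class token + 2 patches, 2 layers, 2 heads per layer.
head_a = [[0.5, 0.25, 0.25] for _ in range(3)]
head_b = [[0.5, 0.0, 0.5] for _ in range(3)]
layers = [[head_a, head_b], [head_a, head_a]]

scores = cls_patch_scores(layers)
print(scores)                    # [0.1875, 0.3125]
print(top_k_patches(scores, 1))  # [1]
```

In the toy input, the second patch receives more class-token attention on average, so it is the one selected. A real model would supply the attention tensors from its own forward pass, and a weighted or multiplicative fusion could replace the simple average.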


Similar resources

DeepKSPD: Learning Kernel-matrix-based SPD Representation for Fine-grained Image Recognition

Being symmetric positive-definite (SPD), covariance matrix has traditionally been used to represent a set of local descriptors in visual recognition. Recent study shows that kernel matrix can give considerably better representation by modelling the nonlinearity in the local descriptor set. Nevertheless, neither the descriptors nor the kernel matrix is deeply learned. Worse, they are considered ...


Fine-Grained Function Visibility for Multiple Dispatch with Multiple Inheritance

Object-oriented languages with multiple dispatch and multiple inheritance provide rich expressiveness but statically and modularly checking programs in such languages to guarantee that no ambiguous calls can occur at run time has been a difficult problem. We present a core calculus for Fortress, which provides various language features—notably functional methods and components— and solves the p...


Paraphrasing of Synonyms for a Fine-grained Data Representation

The paper addresses the question of how the paraphrasing of synonyms can be linked with a fine-grained, ontology-based data representation. Our challenge is to identify, for a set of synonyms (including terms and multiword expressions), the best lexical paraphrases suitable for given contexts. Our hypothesis is that: i. the minimal context in which the paraphrasing can be validated is different for di...


Multidimensional interactive fine-grained image retrieval

We propose an image retrieval methodology for a collection of similar images. By similar, we mean that one can define, for the collection, a set of dimensions, and for each of which a set of features. The dimensions are used to capture the essential characteristics of the images in the collection, and the features are for describing each image to a certain degree. We call this strategy fine-gra...


Weakly Supervised Fine-Grained Image Categorization

In this paper, we categorize fine-grained images without using any object / part annotation in either the training or the testing stage, a step towards making it suitable for deployments. Fine-grained image categorization aims to classify objects with subtle distinctions. Most existing works heavily rely on object / part detectors to build the correspondence between object parts by using o...



Journal

Journal title: Journal of Physics

Year: 2023

ISSN: 0022-3700, 1747-3721, 0368-3508, 1747-3713

DOI: https://doi.org/10.1088/1742-6596/2595/1/012004